This repository was archived by the owner on Aug 30, 2022. It is now read-only.

@finiteprods finiteprods commented Nov 13, 2020

Warning. The changes here are a proof of concept for illustration only, and should not be merged.

This branch contains some test code to prove that it is possible, in principle, to support an alternative aggregation to "federated averaging" in the PET protocol. The particular example illustrated here is a histogram aggregation. A quick breakdown of the experiment:

  • the coordinator and participants are assumed to be in agreement about the ranges of values in the histogram, e.g. 0 - 5, 5 - 10, 10 - 15, 15 - 20.
  • the test-drive is used to simulate clients providing a "measurement" in one of those ranges, e.g. a measurement of 7.5 falls into the 5 - 10 range. To convey this range to the coordinator, the client constructs a "model" [0, 1, 0, 0], with a 1 in the position of its range and 0 everywhere else.
  • the PET protocol runs as usual (not completely true - explained later), so that the coordinator learns none of the individual models, but still unmasks their overall sum. That sum is the histogram: each entry counts the measurements that fell into the corresponding range.
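The encoding step described above can be sketched as follows. This is a minimal illustration, not the code on this branch: the names `EDGES`, `bucket_index` and `one_hot_model` are hypothetical, and the sketch assumes half-open ranges (lower bound included, upper bound excluded), since the description does not say which range a boundary value such as 5 belongs to.

```rust
/// Range boundaries agreed on upfront by the coordinator and the participants.
/// The values are the ones from the example; the constant name is hypothetical.
const EDGES: [f64; 5] = [0.0, 5.0, 10.0, 15.0, 20.0];

/// Returns the index of the range containing `value`, or `None` if the value
/// lies outside all ranges.
/// Assumption: ranges are half-open, i.e. lower <= value < upper.
fn bucket_index(value: f64, edges: &[f64]) -> Option<usize> {
    edges
        .windows(2)
        .position(|w| w[0] <= value && value < w[1])
}

/// Encodes a measurement as a "model" with one entry per range:
/// 1 for the range the measurement falls into, 0 for all others.
fn one_hot_model(value: f64, edges: &[f64]) -> Option<Vec<u32>> {
    let index = bucket_index(value, edges)?;
    let mut model = vec![0; edges.len() - 1];
    model[index] = 1;
    Some(model)
}
```

Under these assumptions, `one_hot_model(7.5, &EDGES)` yields `Some(vec![0, 1, 0, 0])`, matching the example, and a measurement outside 0 - 20 yields `None`.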

In one test run, spinning up a coordinator and running the test-drive with -n 10, I observed:

  • 7 update participants computed a masked model. Of these:
      • 2 sent [1, 0, 0, 0]
      • 3 sent [0, 1, 0, 0]
      • 2 sent [0, 0, 0, 1]
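As a sanity check on the expected result, the seven models listed above can be summed in the clear. This is only a sketch of what the unmasked aggregate should equal; in the protocol the coordinator never sees the individual models, and the function names `sum_models` and `observed_models` are hypothetical.

```rust
/// Element-wise sum of equally sized models.
/// Returns an empty vector when no models are given.
fn sum_models(models: &[Vec<u32>]) -> Vec<u32> {
    let len = models.first().map_or(0, |m| m.len());
    let mut sum = vec![0u32; len];
    for model in models {
        assert_eq!(model.len(), len, "all models must have the same length");
        for (total, count) in sum.iter_mut().zip(model) {
            *total += count;
        }
    }
    sum
}

/// The seven update models reported for this test run.
fn observed_models() -> Vec<Vec<u32>> {
    let mut models = Vec::new();
    models.extend(std::iter::repeat(vec![1, 0, 0, 0]).take(2));
    models.extend(std::iter::repeat(vec![0, 1, 0, 0]).take(3));
    models.extend(std::iter::repeat(vec![0, 0, 0, 1]).take(2));
    models
}
```

Here `sum_models(&observed_models())` gives `[2, 3, 0, 2]`, one count per range.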

On the coordinator, the unmasked model and the histogram are visible in the console output:

histogram for [2, 3, 0, 2]

    *        
*   *       *
*   *       *
*************

Further remarks.

  • with some refinements to the histogram protocol above (not shown here), it is possible to compute other kinds of aggregations, such as a maximum or minimum. This is still done in a privacy-preserving way. Whether these cover a reasonable share of our mobile analytics use cases is still to be investigated.
  • how one parametrises the aggregation, e.g. switch between averaging or histogram or something else, is something to be considered elsewhere.
  • the test-drive still uses dummy models (just in this slightly different form). In the real implementation, a client would construct a model based on some value accessible on the device.
  • as mentioned above, every party was assumed to know the ranges of the histogram upfront. In practice, this would need to be communicated somehow - clients need this information to compute their models.
  • a small modification was made to the unmasking part of the PET protocol - we skip the "correction" step which re-scales the unmasked vector. The reason is that we would like a straightforward sum of the models, rather than a weighted average.
  • (minor point) to print the mini-histogram, I used a tiny crate hist. While the above output looks sensible, my mileage varied a lot! It behaves a little peculiarly, sometimes adding / removing a point. It's not documented, so perhaps I'm misusing it.
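To illustrate why the "correction" step is skipped, the sketch below shows what re-scaling would do to the histogram from the test run. It assumes, purely for illustration, that all participants carry equal weight, so the correction reduces to dividing by the number of contributions; the actual PET correction may differ, and `rescale` is a hypothetical name.

```rust
/// Hypothetical stand-in for the skipped "correction" step.
/// Assumption: equal weights, so re-scaling is a division by the number
/// of contributing participants. `contributions` must be non-zero.
fn rescale(sum: &[u32], contributions: u32) -> Vec<f64> {
    sum.iter()
        .map(|&count| f64::from(count) / f64::from(contributions))
        .collect()
}
```

Applied to `[2, 3, 0, 2]` with 7 contributions, this turns the counts into fractions that add up to 1 (a relative-frequency view) and loses the absolute counts, which is why a plain sum is preferred for the histogram.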

@finiteprods

closing as this is superseded by #635
